Expressive Speech Animation Synthesis with Phoneme-Level Controls
Authors
Abstract
This paper presents a novel data-driven expressive speech animation synthesis system with phoneme-level controls. The system is built on a pre-recorded facial motion capture database in which an actress was directed to recite a predesigned corpus with four facial expressions (neutral, happiness, anger, and sadness). Given new phoneme-aligned expressive speech and its emotion modifiers as inputs, a constrained dynamic programming algorithm searches the processed facial motion database for best-matched captured motion clips by minimizing a cost function. Users can optionally specify ‘hard constraints’ (motion-node constraints for expressing phoneme utterances) and ‘soft constraints’ (emotion modifiers) to guide the search. We also introduce a phoneme–Isomap interface for visualizing and interacting with phoneme clusters, which are typically composed of thousands of facial motion capture frames. On top of this novel visualization interface, users can conveniently remove contaminated motion subsequences from a large facial motion dataset. Facial animation synthesis experiments and objective comparisons between synthesized and captured facial motion showed that this system is effective for producing realistic expressive speech animations.
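The constrained dynamic programming search described in the abstract can be pictured as a Viterbi-style pass over per-phoneme candidate clips, accumulating a matching cost, a transition-smoothness cost, and an emotion-mismatch penalty, with optional pinned choices acting as hard constraints. This is only a hedged sketch of the general technique; the function names, weights, and cost terms below are illustrative assumptions, not the paper's actual implementation.

```python
def search_motion_path(phonemes, emotions, candidates, match_cost,
                       transition_cost, emotion_cost,
                       w_match=1.0, w_trans=1.0, w_emo=0.5,
                       hard_constraints=None):
    """Return a minimum-cost sequence of clip indices, one per phoneme.

    candidates[t]    -- list of candidate motion clips for phoneme t
    hard_constraints -- optional dict {t: clip_index} pinning a choice
                        (a stand-in for the paper's motion-node constraints)
    """
    T = len(phonemes)
    INF = float("inf")
    best = [[INF] * len(candidates[t]) for t in range(T)]  # running costs
    back = [[-1] * len(candidates[t]) for t in range(T)]   # backpointers

    # Initialize the first phoneme's candidates.
    for j, clip in enumerate(candidates[0]):
        if hard_constraints and hard_constraints.get(0, j) != j:
            continue  # pinned to a different clip
        best[0][j] = (w_match * match_cost(clip, phonemes[0])
                      + w_emo * emotion_cost(clip, emotions[0]))

    # Forward pass: extend every surviving path by one phoneme.
    for t in range(1, T):
        for j, clip in enumerate(candidates[t]):
            if hard_constraints and hard_constraints.get(t, j) != j:
                continue
            local = (w_match * match_cost(clip, phonemes[t])
                     + w_emo * emotion_cost(clip, emotions[t]))
            for i, prev in enumerate(candidates[t - 1]):
                if best[t - 1][i] == INF:
                    continue
                c = (best[t - 1][i] + local
                     + w_trans * transition_cost(prev, clip))
                if c < best[t][j]:
                    best[t][j] = c
                    back[t][j] = i

    # Trace back the optimal path from the cheapest final candidate.
    j = min(range(len(candidates[-1])), key=lambda k: best[-1][k])
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path
```

A hard constraint simply removes all competing candidates at that time step, so the search is forced through the user's chosen motion node while the rest of the path stays cost-optimal.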
Similar Resources
Automatic Dynamic Expression Synthesis For Speech Animation
Although a large amount of research has been done in speech animation, both 2D and 3D, one shortcoming of current speech animation methods is that they cannot generate dynamic expressions automatically. In this paper, an automatic technique for synthesizing novel dynamic expression for 3D speech animation is presented. After a Phoneme-Independent Expression Eigen-Space (PIEES) is extracted from...
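An expression eigen-space of the kind sketched above is commonly extracted by subtracting neutral speech motion from expressive motion and applying PCA to the residual. The snippet below is a hedged illustration of that general idea only; the function name, matrix shapes, and the use of a plain SVD are assumptions, not the PIEES method's actual procedure.

```python
import numpy as np

def expression_eigenspace(expressive, neutral, k=3):
    """expressive, neutral: (frames, markers) motion matrices.

    Returns (mean, basis), where basis has shape (k, markers) and holds
    the leading principal directions of the expression residual.
    """
    # Subtracting neutral motion isolates the expression component,
    # making the residual approximately phoneme-independent.
    residual = expressive - neutral
    mean = residual.mean(axis=0)
    centered = residual - mean
    # SVD of the centered data yields the principal directions directly.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]
```

New expressions can then be synthesized by varying the coefficients along the retained basis vectors and adding the result back onto neutral speech motion.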
Data-Driven Speech Animation Synthesis Focusing on Realistic Inside of the Mouth
Speech animation synthesis is still a challenging topic in the field of computer graphics. Despite much progress, detailed appearance of the inner mouth, such as the tip of the tongue nipped between the teeth and the back of the tongue, has not been achieved in the resulting animations. To solve this problem, we propose a method of data-driven speech animation synthesis especially when focusing on the inside of...
A 3D audio-visual animated agent for expressive conversational question answering
This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1/ perceptual experiments (e.g. perception of expressivity and 3D movements in both the audio and visual channels); 2/ design of human-computer interfaces requiring head mo...
Phoneme-level articulatory animation in pronunciation training
Speech visualization is extended to use animated talking heads for computer assisted pronunciation training. In this paper, we design a data-driven 3D talking head system for articulatory animations with synthesized articulator dynamics at the phoneme level. A database of AG500 EMA-recordings of three-dimensional articulatory movements is proposed to explore the distinctions of producing the so...
Strategies and results for the evaluation of the naturalness of the LIPPS facial animation system
The paper describes the strategy and results of an evaluation of the naturalness of a facial animation system with the help of hearing-impaired persons. It shows perspectives for improvement of the facial animation, independent of the animation model itself. The fundamental thesis of the evaluation is that the comparison of presented and perceived visual information has to be performed on ba...
Journal: Comput. Graph. Forum
Volume: 27
Pages: -
Published: 2008